R workflow integrating models in computer vision and statistical ecology: Trade-offs between deep learning for species identification and inferring spatial co-occurrence
Computer vision is a field of artificial intelligence in which a machine is taught how to extract and interpret the content of an image (Krizhevsky, Sutskever, and Hinton 2012). Computer vision relies on deep learning that allows computational models to learn from training data – a set of manually labelled images – and make predictions on new data – a set of unlabelled images (Baraniuk, Donoho, and Gavish 2020; LeCun, Bengio, and Hinton 2015). With the growing availability of massive data, computer vision with deep learning is being increasingly used to perform important tasks such as object detection, face recognition, action and activity recognition or human pose estimation in fields as diverse as medicine, robotics, transportation, genomics, sports and agriculture (Voulodimos et al. 2018).
In ecology in particular, there is growing interest in deep learning for automating repetitive analyses of large numbers of images, such as identifying plant and animal species, distinguishing individuals of the same or different species, counting individuals or detecting relevant features (Christin, Hervet, and Lecomte 2019; Lamba et al. 2019; Weinstein 2018). By saving hours of manual data analysis and tapping into the massive amounts of data that keep accumulating with technological advances, deep learning has the potential to become an essential tool for ecologists and applied statisticians.
Despite the promising future of computer vision and deep learning, several issues stand in the way of their wide adoption by the community of ecologists (Wearn, Freeman, and Jacoby 2019). First, there is a programming barrier: most, if not all, algorithms are written in Python, while most ecologists are better versed in R (Lai et al. 2019). If ecologists are to use computer vision routinely, bridges are needed between these two languages, e.g. the reticulate package (Allaire et al. 2017) or the shiny package (Tabak et al. 2020). Second, most recent applications of computer vision via deep learning in ecology (short WoS review and Table?) have focused on computational aspects and simple tasks without addressing the underlying ecological questions (Sutherland et al. 2013) or carrying out the statistical data analysis (Gimenez et al. 2014). Although this is perfectly understandable given the challenges at hand, we argue that a better integration of the why (ecological questions), the what (data) and the how (statistics) would benefit computer vision for ecology (see also Weinstein 2018). (Develop here, speak about tradeoffs, and relevance)
Here, we showcase a full why-what-how workflow in R using a case study on elucidating the structure of an ecological community (a set of co-occurring species), namely that of the Eurasian lynx (Lynx lynx) and its main prey. First, we introduce the case study and motivate the need for deep learning. Second, we illustrate deep learning for the identification of animal species in large numbers of images, including model training and validation with a dataset of labelled images, and prediction with a new dataset of unlabelled images. Last, we quantify spatial co-occurrence using statistical models. Our main conclusion is that there is no need to go too far in deep learning to get a reasonable answer to the ecological question. We hope that our reproducible workflow will be useful to ecologists and applied statisticians.
The lynx (Lynx lynx) went extinct in France at the end of the 19th century due to habitat degradation, human persecution and a decrease in prey availability (Vandel and Stahl 2005). The species was reintroduced in Switzerland in the 1970s (Breitenmoser 1998), then re-colonised France through the Jura mountains in the 1980s (Vandel and Stahl 2005). The species is listed as endangered under the 2017 IUCN Red List and is of conservation concern in France due to habitat fragmentation, poaching and collisions with vehicles. The Jura holds the bulk of the French lynx population.
To better understand its distribution, we need to quantify its interactions with its main prey, roe deer (Capreolus capreolus) and chamois (Rupicapra rupicapra) (Molinari-Jobin et al. 2007), two ungulate species that are also hunted. To assess the relative contribution of predation and hunting, a predator-prey program was set up jointly by the French Office for Biodiversity, the Federations of Hunters from the Jura, Ain and Haute-Savoie counties and the French National Centre for Scientific Research.
Animal detections were made using camera traps deployed in the Jura mountains, in the Jura and Ain counties (see Figure 1). We divided the two study areas into grids of 2.7 \(\times\) 2.7 km cells, hereafter sites (Zimmermann et al. 2013), and set two camera traps per site (Xenon white flash with passive infrared trigger mechanisms, models Capture, Ambush and Attack; Cuddeback). There were 18 sites in the Jura study area and 11 in the Ain study area that were active over the study period (from February 2016 to October 2017 for the Jura county, and from February 2017 to May 2019 for the Ain county). Camera traps were checked weekly to change memory cards and batteries and to remove fresh snow after heavy snowfall.
Figure 1: Study area, grid and camera trap locations.
In total, 45563 and 18044 pictures were considered in the Jura and Ain sites respectively, after manually dropping empty pictures. We identified the species present on all images by hand (see Table 1) using digiKam, a free open-source digital photo management application (https://www.digikam.org/). This operation took several weeks of full-time work, a cost that is often identified as a limitation of camera trap studies. Computer vision with deep learning has been identified as a promising approach to expedite this tedious task (Norouzzadeh et al. 2021; Tabak et al. 2019; Willi et al. 2019).
| Species in Jura study site | n | Species in Ain study site | n |
|---|---|---|---|
| human | 31644 | human | 8931 |
| vehicle | 5637 | vehicle | 2390 |
| dog | 2779 | horse rider | 1206 |
| fox | 2088 | roe deer | 1101 |
| chamois | 919 | dog | 1057 |
| wild boar | 522 | fox | 922 |
| badger | 401 | wild boar | 643 |
| roe deer | 368 | badger | 577 |
| cat | 343 | hunter | 368 |
| lynx | 302 | lynx | 203 |
We used transfer learning to fine-tune a pre-trained CNN (resnet50) on the annotated pictures from the Jura site. We then compared the predictions of this new model for the pictures from the Ain site with the manual annotations of the same pictures. Training was run on GPU machines.
We used the fastai R package, which provides R wrappers to the fastai Python library. The fastai library simplifies the training of CNNs.
(Explain the general principle and the steps below.) For reproducibility, the analysis below runs on a CPU with a subsample of the picture datasets and only a few pictures for automatic tagging. The results we report, however, were obtained with a GPU, more epochs and all pictures. The fully trained model, fitted to all pictures, is provided via Zenodo.
This is where deep learning comes in; it is increasingly used in ecology, see for example Christin, Hervet, and Lecomte (2019). The idea is to feed the algorithms with photos as input and to retrieve as output the species present in each photo. We used the fastai library, which relies on the Python language and its PyTorch library. An advantage of this library is that it comes with an R package, fastai, which provides several functions to use it.
What results did we obtain? We first performed transfer learning on a study site in the Jura where we had photos that were already labelled. We used a pre-trained resnet50 model. We were able to classify the lynx and its prey, chamois and roe deer, with a satisfactory degree of certainty.
We then used the model to automatically label photos taken with camera traps installed at another site, in the Ain. These photos were also labelled by hand, so the truth is known.
Occupancy models, such as that of Rota et al. (2016), assume that the study area is closed, that the occupancy state of a site does not change during the study period, and that sites are independent. To meet these assumptions, we split the detection data into three seasons based on lynx biology. The first season is winter, from 1 October 2016 to 31 January 2017. During this period females are mobile and are followed by their young, which are still learning. The second season is spring, from 1 February to 31 May 2017. Lynx activity is highest during this season. The young of the previous year separate from their mother and disperse between January and April, in search of a territory. Reproduction takes place from February to mid-April. Females then isolate themselves to give birth. Finally, during the third season (summer), from 1 June to 30 September 2017, females give birth after a period of 67 to 70 days. During this period, and until 6 to 9 weeks after the birth of the young, the mother remains sedentary. From then on, females are followed by their young, which are learning (Breitenmoser-Würsten et al. 2007a; Drouilly 2019).
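The season boundaries given above can be encoded in a small helper. This is a minimal sketch in base R, assuming picture dates are available as `Date` objects or ISO-formatted strings; the names `season_breaks`, `season_labels` and `assign_season()` are hypothetical and are not part of the original workflow.

```r
# Season boundaries taken from the text; each season is closed on the left
# and open on the right, so 1 February 2017 belongs to spring.
season_breaks <- as.Date(c("2016-10-01", "2017-02-01", "2017-06-01", "2017-10-01"))
season_labels <- c("winter", "spring", "summer")

# Hypothetical helper: returns a factor of seasons, NA outside the three seasons
assign_season <- function(dates) {
  cut(as.Date(dates), breaks = season_breaks, labels = season_labels, right = FALSE)
}

# Illustrative dates only
example_dates <- c("2016-10-01", "2017-01-31", "2017-02-01", "2017-09-30", "2017-10-01")
example_seasons <- assign_season(example_dates)
```

Pictures dated outside the three seasons get `NA` and can be dropped before building the detection data.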
For each season, the raw photos are converted into detection data. Specifically, we build one detection matrix per species, made of 0s (species not detected) and 1s (species detected). Rows are camera traps and columns are capture occasions. A capture is here the taking of a picture by a camera trap. In our case, a capture occasion is defined as one week (Gimenez et al. 2019). When traps are not active on some capture occasions, we record the data as not available, and the corresponding trap-weeks are not used in the analysis. We thereby avoid confounding the absence of data collection with a non-detection of the species. In addition, when a trap does not capture any lynx for a long period, it is moved or removed and is then given a new name. These changes cause no problem in the analysis, since the analysis is not spatially explicit and each trap is assumed to be independent of the others.
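The construction of a detection matrix can be sketched as follows. This is an illustration in base R under stated assumptions, not the code used for the analysis: `build_detection_matrix()` is a hypothetical helper, `detections` is assumed to hold one row per picture of the focal species (columns `trap`, `date`), and `activity` one row per period during which a trap was active (columns `trap`, `start`, `end`). The toy data are made up.

```r
# Hypothetical helper: rows are traps, columns are weekly capture occasions.
# NA = trap not active that week, 0 = active but no detection, 1 = detection.
build_detection_matrix <- function(detections, activity, season_start, season_end) {
  season_start <- as.Date(season_start)
  season_end <- as.Date(season_end)
  week_starts <- seq(season_start, season_end, by = "week")
  ws <- as.numeric(week_starts)
  we <- pmin(ws + 6, as.numeric(season_end))  # last day of each occasion
  traps <- sort(unique(as.character(activity$trap)))
  y <- matrix(NA_integer_, nrow = length(traps), ncol = length(ws),
              dimnames = list(traps, paste0("week", seq_along(ws))))
  # A trap-week counts as sampled when an activity period overlaps that week
  for (i in seq_len(nrow(activity))) {
    active <- as.numeric(activity$start[i]) <= we & as.numeric(activity$end[i]) >= ws
    y[as.character(activity$trap[i]), active] <- 0L
  }
  # Any picture of the species within a week turns that trap-week into a 1
  keep <- detections$date >= season_start & detections$date <= season_end
  det <- detections[keep, , drop = FALSE]
  occ <- findInterval(as.numeric(det$date), ws)
  for (k in seq_len(nrow(det))) {
    y[as.character(det$trap[k]), occ[k]] <- 1L
  }
  y
}

# Toy example (made-up data): trap B only becomes active in the third week
activity <- data.frame(
  trap = c("A", "B"),
  start = as.Date(c("2017-02-01", "2017-02-15")),
  end = as.Date(c("2017-02-28", "2017-02-28"))
)
detections <- data.frame(
  trap = c("A", "A", "B"),
  date = as.Date(c("2017-02-02", "2017-02-03", "2017-02-20"))
)
y <- build_detection_matrix(detections, activity, "2017-02-01", "2017-02-28")
```

Keeping `NA` for inactive trap-weeks, rather than 0, is what prevents missing data from being read as non-detections. The sketch assumes that every trap appearing in `detections` is also listed in `activity`.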
Based on the number of false negatives (a photo with a lynx for which another species is predicted) and false positives (a photo without a lynx for which a lynx is predicted), the results are not very satisfactory. The question, however, is whether this lack of precision harms the inference of predator-prey interactions. To find out, we used statistical models that infer co-occurrence between species while accounting for the difficulty of detecting them in the field. These are the occupancy models developed by Rota et al. (2016) and implemented in R by Fiske and Chandler (2011).
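False negatives and false positives for a focal species can be counted directly from the manual and automatic labels. The sketch below is a minimal base R illustration; `count_errors()` is a hypothetical helper and the label vectors are made-up toy data, not results from the study.

```r
# Hypothetical helper: error counts for one focal species, given the manual
# labels (truth) and the labels predicted by the model, one element per photo
count_errors <- function(truth, predicted, species = "lynx") {
  stopifnot(length(truth) == length(predicted))
  is_true <- truth == species
  is_pred <- predicted == species
  tp <- sum(is_true & is_pred)    # species present and predicted
  fn <- sum(is_true & !is_pred)   # species present, another label predicted
  fp <- sum(!is_true & is_pred)   # species absent but predicted
  c(true_positive = tp, false_negative = fn, false_positive = fp,
    precision = tp / (tp + fp), recall = tp / (tp + fn))
}

# Toy labels (made up for illustration)
truth     <- c("lynx", "lynx", "lynx", "roe deer", "chamois", "fox")
predicted <- c("lynx", "fox",  "lynx", "lynx",     "chamois", "fox")
errors <- count_errors(truth, predicted)

# Full confusion table across all labels
confusion <- table(truth = truth, predicted = predicted)
```

Precision is the share of predicted lynx photos that truly show a lynx, and recall is the share of true lynx photos that the model recovers.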
We obtain the probabilities of lynx presence, conditional on the presence or absence of its prey (Figure 1). There is a slight bias in the estimated probability of lynx presence given the presence of its two favourite prey when we rely on the automatic labelling of photos. Since the differences are not large, the ecologist may decide to ignore them in view of the time saved compared with labelling by hand. The bias is larger, however, for the probability of lynx presence given the presence of roe deer and the absence of chamois, which is underestimated.
Probabilities of lynx presence, conditional on the presence or absence of its prey. In red, with photos labelled by hand. In grey-blue, with photos labelled automatically.
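To make the conditional probabilities concrete, the sketch below computes them from the natural parameters of a multispecies occupancy model in the sense of Rota et al. (2016), in which the probability of each presence/absence state is proportional to the exponential of a sum of species-specific and pairwise terms. This is an illustration in base R, not the fitted model: the parameter values are made up, the three-way interaction is assumed to be zero, and `state_probs()` and `lynx_given_prey()` are hypothetical helpers.

```r
# Hypothetical helper: probabilities of the 8 presence/absence states for
# lynx, roe deer and chamois, given natural parameters f (first-order and
# pairwise terms only; the three-way term is assumed to be zero)
state_probs <- function(f) {
  states <- expand.grid(lynx = 0:1, roe_deer = 0:1, chamois = 0:1)
  eta <- with(states,
    f[["lynx"]] * lynx + f[["roe_deer"]] * roe_deer + f[["chamois"]] * chamois +
    f[["lynx_roe_deer"]] * lynx * roe_deer +
    f[["lynx_chamois"]] * lynx * chamois +
    f[["roe_deer_chamois"]] * roe_deer * chamois)
  states$psi <- exp(eta) / sum(exp(eta))
  states
}

# Probability that lynx is present, given the state (1 or 0) of each prey
lynx_given_prey <- function(f, roe_deer, chamois) {
  p <- state_probs(f)
  keep <- p$roe_deer == roe_deer & p$chamois == chamois
  sum(p$psi[keep & p$lynx == 1]) / sum(p$psi[keep])
}

# Made-up parameter values, for illustration only
f <- c(lynx = -0.5, roe_deer = 0.8, chamois = 0.2,
       lynx_roe_deer = 0.6, lynx_chamois = 0.4, roe_deer_chamois = 0.1)
p_both_prey <- lynx_given_prey(f, roe_deer = 1, chamois = 1)
p_roe_deer_only <- lynx_given_prey(f, roe_deer = 1, chamois = 0)
```

Under these assumptions the conditional probability reduces to a logistic function of the lynx term plus the pairwise terms of the prey that are present, so comparing `p_both_prey` with `p_roe_deer_only` isolates the contribution of the lynx-chamois term.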
In conclusion, using a model trained on one site to predict on another site is tricky. It is easy to get lost in the maze of deep learning, but one should stay focused on the ecological question, and average algorithm performance can be accepted if the resulting bias in the ecological indicators is small. Still, we can do better, and we are currently developing species distribution models that would account for both interactions and false positives and false negatives. To go further with deep learning and image analysis, we refer to Miele, Dray, and Gimenez (2021).
Summarise what we did.
Main lessons.
To be extended.
Ongoing work to include covariates.
The lynx (Lynx lynx, Linné 1758) has been present again in the Jura Mountains since the 1980s. To sustain its return within a functional ecosystem, we need to understand the factors that affect the distribution of the lynx and of its prey, the roe deer (Capreolus capreolus, Linné 1758) and the chamois (Rupicapra rupicapra, Linné 1758). How do environmental variables (forest cover, human disturbance) affect the presence of the lynx and its prey and their co-occurrence in the French Jura? What is the relative contribution of habitat preferences and predator-prey relationships? To answer these questions, we used the multi-species occupancy model developed by Rota et al. (2016), which accounts for imperfect species detection. With this model, we quantified lynx presence as a function of environmental variables and of the presence and absence of its prey, using data from a non-invasive camera-trap monitoring protocol. We show that the presence of the lynx and its prey is equally influenced by forest cover and by the presence of lynx, chamois or roe deer. Therefore, we need to account for interactions between species in relation to habitat quality when inferring the occupancy of these species.
For much of the last century, ecologists have typically interpreted the diversity and composition of communities as the outcome of local-scale processes, both biotic (e.g. competition and predation) and abiotic (e.g. temperature and nutrients).
Some of the most challenging questions in ecology concern communities: sets of co-occurring species.
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] exifr_0.3.1 fastai_2.0.4 forcats_0.5.1 stringr_1.4.0
## [5] dplyr_1.0.7 purrr_0.3.4 readr_1.4.0 tidyr_1.1.3
## [9] tibble_3.1.3 ggplot2_3.3.5 tidyverse_1.3.0
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.2 sass_0.3.1.9001 jsonlite_1.7.2
## [4] carData_3.0-4 modelr_0.1.8 bslib_0.2.4
## [7] assertthat_0.2.1 highr_0.9 cellranger_1.1.0
## [10] yaml_2.2.1 pillar_1.6.1 backports_1.2.1
## [13] lattice_0.20-41 glue_1.4.2 reticulate_1.20
## [16] digest_0.6.27 ggsignif_0.6.1 rvest_1.0.0
## [19] colorspace_2.0-2 htmltools_0.5.1.9002 Matrix_1.3-2
## [22] plyr_1.8.6 pkgconfig_2.0.3 broom_0.7.6
## [25] haven_2.3.1 scales_1.1.1 openxlsx_4.2.3
## [28] rio_0.5.26 generics_0.1.0 car_3.0-10
## [31] ellipsis_0.3.2 ggpubr_0.4.0 withr_2.4.2
## [34] cli_3.0.1 magrittr_2.0.1 crayon_1.4.1
## [37] readxl_1.3.1 evaluate_0.14 fs_1.5.0
## [40] fansi_0.5.0 foreign_0.8-81 rstatix_0.7.0
## [43] xml2_1.3.2 data.table_1.14.0 tools_4.0.2
## [46] hms_1.1.0 lifecycle_1.0.0 munsell_0.5.0
## [49] reprex_1.0.0 zip_2.1.1 compiler_4.0.2
## [52] jquerylib_0.1.3 rlang_0.4.11.9001 grid_4.0.2
## [55] rstudioapi_0.13 rappdirs_0.3.3 rmarkdown_2.7
## [58] gtable_0.3.0 codetools_0.2-18 abind_1.4-5
## [61] DBI_1.1.1 curl_4.3.2 R6_2.5.0
## [64] lubridate_1.7.10 knitr_1.33 fastmap_1.1.0
## [67] utf8_1.2.2 stringi_1.6.2 Rcpp_1.0.7
## [70] vctrs_0.3.8 png_0.1-7 dbplyr_2.1.0
## [73] tidyselect_1.1.1 xfun_0.24
ANR. Folks who have labeled pix if not co-authors. MBB folks. Vincent Miele for his help along the way, and being an inspiration.
Allaire, JJ, Kevin Ushey, Yuan Tang, and Dirk Eddelbuettel. 2017. Reticulate: R Interface to Python. https://github.com/rstudio/reticulate.
Baraniuk, Richard, David Donoho, and Matan Gavish. 2020. “The Science of Deep Learning.” Proceedings of the National Academy of Sciences 117 (48): 30029–32. https://doi.org/10.1073/pnas.2020596117.
Breitenmoser, Urs. 1998. “Large Predators in the Alps: The Fall and Rise of Man’s Competitors.” Biological Conservation, Conservation Biology and Biodiversity Strategies, 83 (3): 279–89. https://doi.org/10.1016/S0006-3207(97)00084-0.
Christin, Sylvain, Éric Hervet, and Nicolas Lecomte. 2019. “Applications for Deep Learning in Ecology.” Edited by Hao Ye. Methods in Ecology and Evolution 10 (10): 1632–44. https://doi.org/10.1111/2041-210X.13256.
Fiske, Ian, and Richard Chandler. 2011. “unmarked: An R Package for Fitting Hierarchical Models of Wildlife Occurrence and Abundance.” Journal of Statistical Software 43 (10): 1–23.
Gimenez, Olivier, Stephen T. Buckland, Byron J. T. Morgan, Nicolas Bez, Sophie Bertrand, Rémi Choquet, Stéphane Dray, et al. 2014. “Statistical Ecology Comes of Age.” Biology Letters 10 (12): 20140698. https://doi.org/10.1098/rsbl.2014.0698.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012. “ImageNet Classification with Deep Convolutional Neural Networks.” In Advances in Neural Information Processing Systems 25, edited by F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, 1097–1105. Curran Associates, Inc.
Lahoz-Monfort, José J, and Michael J L Magrath. 2021. “A Comprehensive Overview of Technologies for Species and Habitat Monitoring and Conservation.” BioScience. https://doi.org/10.1093/biosci/biab073.
Lai, Jiangshan, Christopher J. Lortie, Robert A. Muenchen, Jian Yang, and Keping Ma. 2019. “Evaluating the Popularity of R in Ecology.” Ecosphere 10 (1). https://doi.org/10.1002/ecs2.2567.
Lamba, Aakash, Phillip Cassey, Ramesh Raja Segaran, and Lian Pin Koh. 2019. “Deep Learning for Environmental Conservation.” Current Biology 29 (19): R977–R982. https://doi.org/10.1016/j.cub.2019.08.016.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–44. https://doi.org/10.1038/nature14539.
Miele, Vincent, Stéphane Dray, and Olivier O. Gimenez. 2021. “Images, écologie et deep learning.” Regards sur la biodiversité, February. https://hal.archives-ouvertes.fr/hal-03142486.
Miele, Vincent, Gaspard Dussert, Bruno Spataro, Simon Chamaillé‐Jammes, Dominique Allainé, and Christophe Bonenfant. 2021. “Revisiting Animal Photo‐identification Using Deep Metric Learning and Network Analysis.” Edited by Robert Freckleton. Methods in Ecology and Evolution 12 (5): 863–73. https://doi.org/10.1111/2041-210X.13577.
Molinari-Jobin, Anja, Fridolin Zimmermann, Andreas Ryser, Christine Breitenmoser-Würsten, Simon Capt, Urs Breitenmoser, Paolo Molinari, Heinrich Haller, and Roman Eyholzer. 2007. “Variation in Diet, Prey Selectivity and Home-Range Size of Eurasian Lynx Lynx Lynx in Switzerland.” Wildlife Biology 13 (4): 393–405. https://doi.org/10.2981/0909-6396(2007)13[393:VIDPSA]2.0.CO;2.
Norouzzadeh, Mohammad Sadegh, Dan Morris, Sara Beery, Neel Joshi, Nebojsa Jojic, and Jeff Clune. 2021. “A Deep Active Learning System for Species Identification and Counting in Camera Trap Images.” Edited by Matthew Schofield. Methods in Ecology and Evolution 12 (1): 150–61. https://doi.org/10.1111/2041-210X.13504.
Rota, Christopher T., Marco A. R. Ferreira, Roland W. Kays, Tavis D. Forrester, Elizabeth L. Kalies, William J. McShea, Arielle W. Parsons, and Joshua J. Millspaugh. 2016. “A Multispecies Occupancy Model for Two or More Interacting Species.” Methods in Ecology and Evolution 7 (10): 1164–73.
Sutherland, William J., Robert P. Freckleton, H. Charles J. Godfray, Steven R. Beissinger, Tim Benton, Duncan D. Cameron, Yohay Carmel, et al. 2013. “Identification of 100 Fundamental Ecological Questions.” Edited by David Gibson. Journal of Ecology 101 (1): 58–67. https://doi.org/10.1111/1365-2745.12025.
Tabak, Michael A., Mohammad S. Norouzzadeh, David W. Wolfson, Erica J. Newton, Raoul K. Boughton, Jacob S. Ivan, Eric A. Odell, et al. 2020. “Improving the Accessibility and Transferability of Machine Learning Algorithms for Identification of Animals in Camera Trap Images: MLWIC2.” Ecology and Evolution 10 (19): 10374–83. https://doi.org/10.1002/ece3.6692.
Tabak, Michael A., Mohammad S. Norouzzadeh, David W. Wolfson, Steven J. Sweeney, Kurt C. Vercauteren, Nathan P. Snow, Joseph M. Halseth, et al. 2019. “Machine Learning to Classify Animal Species in Camera Trap Images: Applications in Ecology.” Edited by Theoni Photopoulou. Methods in Ecology and Evolution 10 (4): 585–90. https://doi.org/10.1111/2041-210X.13120.
Vandel, Jean-Michel, and Philippe Stahl. 2005. “Distribution Trend of the Eurasian Lynx Lynx Lynx Populations in France.” Mammalia 69 (2). https://doi.org/10.1515/mamm.2005.013.
Voulodimos, Athanasios, Nikolaos Doulamis, Anastasios Doulamis, and Eftychios Protopapadakis. 2018. “Deep Learning for Computer Vision: A Brief Review.” Edited by Diego Andina. Computational Intelligence and Neuroscience 2018 (February): 7068349. https://doi.org/10.1155/2018/7068349.
Wearn, Oliver R., Robin Freeman, and David M. P. Jacoby. 2019. “Responsible AI for Conservation.” Nature Machine Intelligence 1 (2): 72–73. https://doi.org/10.1038/s42256-019-0022-7.
Weinstein, Ben G. 2018. “A Computer Vision for Animal Ecology.” Edited by Laura Prugh. Journal of Animal Ecology 87 (3): 533–45. https://doi.org/10.1111/1365-2656.12780.
Willi, Marco, Ross T. Pitman, Anabelle W. Cardoso, Christina Locke, Alexandra Swanson, Amy Boyer, Marten Veldthuis, and Lucy Fortson. 2019. “Identifying Animal Species in Camera Trap Images Using Deep Learning and Citizen Science.” Edited by Oscar Gaggiotti. Methods in Ecology and Evolution 10 (1): 80–91. https://doi.org/10.1111/2041-210X.13099.
Zimmermann, Fridolin, Christine Breitenmoser-Würsten, Anja Molinari-Jobin, and Urs Breitenmoser. 2013. “Optimizing the Size of the Area Surveyed for Monitoring a Eurasian Lynx (Lynx Lynx) Population in the Swiss Alps by Means of Photographic Capture-Recapture.” Integrative Zoology 8 (3): 232–43. https://doi.org/10.1111/1749-4877.12017.